40 research outputs found

    Energy Consumption in Compact Integer Vectors: A Study Case

    [Abstract] In the analysis and design of algorithms and data structures, most researchers focus only on the space/time trade-off, and little attention has been paid to energy consumption. Moreover, most efforts in Green Computing have been devoted to hardware-related issues, while green software is still in its infancy. Optimizing the usage of computing resources, minimizing power consumption, and increasing battery life are some of the goals of this field of research. To address the most recent sustainability challenges, we must incorporate energy consumption as a first-class constraint when designing new compact data structures. As preliminary work toward that goal, we first need to understand the factors that affect energy consumption and their relation to compression. In this work, we study the energy consumption of several integer vector representations. We execute typical operations over datasets of different nature. We observe that, as commonly believed, energy consumption is highly related to the time required by the process, but not always. We also analyze other parameters, such as the number of instructions, the number of CPU cycles, and memory loads, among others.
    Funding: Ministerio de Ciencia, Innovación y Universidades (TIN2016-77158-C4-3-R); Ministerio de Ciencia, Innovación y Universidades (RTC-2017-5908-7); Xunta de Galicia, co-funded with ERDF (ED431C 2017/58); Xunta de Galicia (ED431G/01); Comisión Nacional de Investigación Científica y Tecnológica (3170534)

    Map algebra on raster datasets represented by compact data structures

    [Abstract] The increase in the size of data repositories has forced the design of new computing paradigms able to process large volumes of data in a reasonable amount of time. One of them is in-memory computing, which advocates storing all the data in main memory to avoid the disk I/O bottleneck. Compression is one of the key technologies for this approach. For raster data, a compact data structure called (Formula presented.)-raster has recently been proposed. It compresses raster maps while still supporting fast retrieval of a given datum or a portion of the data directly from the compressed data. The original work on (Formula presented.)-raster introduced several queries in which it was superior to competitors. However, to be used as the basis of an in-memory system for raster data, it must also prove efficient when performing more complex operations such as the map algebra operators. In this work, we present algorithms to run a set of these operators directly on (Formula presented.)-raster without a decompression procedure.
    Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 31171944, 31640068), the Anhui Provincial Natural Science Foundation (Grant No. 2019B319), and the Earmarked Fund for Anhui Science and Technology Major Project (202003b06020016). Funding information: CITIC; Ministerio de Ciencia e Innovación, Grant/Award Numbers: PID2020-114635RB-I00, PDC2021-120917-C21, PDC2021-121239-C31, PID2019-105221RB-C41, TED2021-129245-C21; Xunta de Galicia, Grant/Award Numbers: ED431C 2021/53, IN852D 2021/3 (CO3). This work was partially supported by CITIC, which is funded by the Xunta de Galicia through the collaboration agreement between the Department of Culture, Education, Vocational Training and Universities and the Galician universities for the reinforcement of the research centers of the Galician University System (CIGUS). IN852D 2021/3 (CO3): partially funded by the EU (ERDF) and GAIN, Conecta COVID call. GRC ED431C 2021/53: partially funded by GAIN/Xunta de Galicia. TED2021-129245B-C21, PDC2021-121239-C31, PDC2021-120917-C21: partially funded by MCIN/AEI/10.13039/501100011033 and "NextGenerationEU"/PRTR. PID2020-114635RB-I00, PID2019-105221RB-C41: partially funded by MCIN/AEI/10.13039/501100011033. Funding for open access charge: Universidade da Coruña/CISUG.

    Inference of viral quasispecies with a paired de Bruijn graph

    Motivation: RNA viruses exhibit a high mutation rate and thus exist in infected cells as a population of closely related strains called viral quasispecies. The viral quasispecies assembly problem asks to characterize the quasispecies present in a sample from high-throughput sequencing data. We study the de novo version of the problem, where reference sequences of the quasispecies are not available. Current methods for assembling viral quasispecies are based either on overlap graphs or on de Bruijn graphs. Overlap graph-based methods tend to be accurate but slow, whereas de Bruijn graph-based methods are fast but less accurate.
    Results: We present viaDBG, a fast and accurate de Bruijn graph-based tool for de novo assembly of viral quasispecies. We first iteratively correct sequencing errors in the reads, which allows us to use large k-mers in the de Bruijn graph. To incorporate the paired-end information in the graph, we also adapt the paired de Bruijn graph for viral quasispecies assembly. These features enable the use of long-range information in contig construction without compromising the speed of de Bruijn graph-based approaches. Our experimental results show that viaDBG is both accurate and fast, whereas previous methods are either fast or accurate but not both. In particular, viaDBG has comparable or better accuracy than SAVAGE, while being at least nine times faster. Furthermore, the speed of viaDBG is comparable to that of PEHaplo, but viaDBG is also able to retrieve low-abundance quasispecies, which are often missed by PEHaplo.
    Peer reviewed
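For readers unfamiliar with the underlying structure, the following is a minimal sketch of the standard (unpaired) de Bruijn graph built from reads. It is not viaDBG's implementation: the function name `de_bruijn`, the toy reads, and the choice of k are assumptions made for illustration, and the error correction and paired-end extensions described in the abstract are not modeled.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Hypothetical helper for illustration only; it ignores sequencing
    errors, reverse complements, and paired-end information.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer links its (k-1)-prefix to its (k-1)-suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Toy input (assumed): two overlapping reads.
graph = de_bruijn(["ACGT", "CGTA"], k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Larger values of k make the graph less tangled but more sensitive to sequencing errors, which is consistent with the abstract's point that correcting errors first allows large k-mers.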

    Exploiting Computation-Friendly Graph Compression Methods for Adjacency-Matrix Multiplication

    [Abstract] Computing the product of the (binary) adjacency matrix of a large graph with a real-valued vector is an important operation that lies at the heart of various graph analysis tasks, such as computing PageRank. In this paper we show that some well-known Web and social graph compression formats are computation-friendly, in the sense that they allow boosting the computation. In particular, we show that the format of Boldi and Vigna allows computing the product in time proportional to the compressed graph size. Our experimental results show speedups of at least 2 on graphs that were compressed at least 5 times with respect to the original. We show that other successful graph compression formats enjoy this property as well.
    Funding: Fundação para a Ciência e a Tecnologia (Portugal) (UID/CEC/50021/2013); Academy of Finland (268324); Fondo Nacional de Desarrollo Científico y Tecnológico (Chile) (1171058); Ministerio de Economía y Competitividad (TIN2016-77158-C4-3-R); Xunta de Galicia (ED431C 2017/58); Xunta de Galicia (ED431G/01); Agencia Nacional de Investigación y Desarrollo (Chile) (ICM/FIC RC13000)
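To make the idea of a computation-friendly encoding concrete, here is a toy sketch of the product y = A x. It does not implement the Boldi and Vigna format: the encoding below is a deliberately simplified assumption in which each row stores a reference to an earlier row whose neighbors it fully contains, plus a list of extra neighbors. The names `multiply_plain`, `multiply_with_references`, `adj`, and `encoded` are hypothetical.

```python
def multiply_plain(adj, x):
    """y = A x for a binary adjacency matrix given as adjacency lists."""
    return [sum(x[j] for j in neighbors) for neighbors in adj]


def multiply_with_references(encoded, x):
    """Same product on a toy reference-encoded graph (assumed format).

    encoded[i] = (ref, extras): row i contains every neighbor of row
    ref (an earlier row, or None) plus the explicit neighbors in
    extras, which must not overlap the inherited ones.
    """
    y = []
    for ref, extras in encoded:
        # Reuse the already computed sum of the referenced row
        # instead of re-reading its neighbors.
        inherited = y[ref] if ref is not None else 0.0
        y.append(inherited + sum(x[j] for j in extras))
    return y


# Toy graph (assumed): 9 edges, but only 5 explicitly stored neighbors.
adj = [[1, 2], [1, 2, 3], [0], [0, 1, 2, 3]]
encoded = [(None, [1, 2]), (0, [3]), (None, [0]), (1, [0])]
x = [0.5, 1.0, 2.0, 4.0]

print(multiply_plain(adj, x))
print(multiply_with_references(encoded, x))
```

In this sketch the second function does work proportional to the number of stored entries rather than the number of edges, which is the flavor of the result the abstract describes; real formats also copy partial lists, which this toy encoding omits.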

    Take one for the team: on the time efficiency of application-level buffer-aided relaying in edge cloud communication

    [Abstract]
    Background: Adding buffers to networks is one of the fundamental advances in data communication. Since edge cloud computing is based on the heterogeneous collaboration network model in a federated environment, it is natural to consider buffer-aided data communication for edge cloud applications. However, existing studies generally pursue the beneficial features of buffering at a cost of time, and many investigations focus on lower-layer data packets rather than application-level communication transactions.
    Aims: Driven by our argument against the claim that buffers “can introduce additional delay to the communication between the source and destination”, this research investigates whether (and, if so, to what extent) the application-level buffering mechanism can improve the time efficiency of edge-cloud data transmissions.
    Method: To collect empirical evidence for the theoretical discussion, we built a testbed to simulate a remote health monitoring system, and conducted both experimental and modeling investigations into first-in-first-served (FIFS) and buffer-aided data transmissions at a relay node in the system.
    Results: An empirical inequality system is established for revealing the time efficiency of buffer-aided edge cloud communication. For example, given the reference of transmitting the 11th data entity in the FIFS manner, the inequality system suggests buffering up to 50 data entities into one transmission transaction on our testbed.
    Conclusions: Beyond the trade-off benefits of buffering data (e.g., energy efficiency and fault tolerance), our investigation argues that the buffering mechanism can also speed up data transmission under certain circumstances. It is therefore worth taking data buffering into account when designing and developing edge cloud applications, even in time-critical contexts.
    Funding: Chilean National Research and Development Agency (11180905); Ministerio de Ciencia e Innovación de España and European Regional Development Fund (RTC-2017-5908-7); Ministerio de Ciencia e Innovación de España and European Regional Development Fund (PID2019-105221RB-C41); Xunta de Galicia and European Regional Development Fund (ED431C 2017/58); Xunta de Galicia and European Regional Development Fund (ED431G 2019/0)

    Compressed Data Structures for Binary Relations in Practice

    [Abstract] Binary relations are commonly used in Computer Science for modeling data. In addition to classical representations using matrices or lists, some compressed data structures have recently been proposed to represent binary relations in compact space, such as the k²-tree and the Binary Relation Wavelet Tree (BRWT). Knowing their storage needs, supported operations, and time performance is key to choosing an appropriate data representation for a given domain or application, its data distribution, and the typical operations computed over the data. In this work, we present an empirical comparison among several compressed representations for binary relations. We analyze their space usage and the speed of their operations using different (synthetic and real) data distributions. We include both neighborhood and set operations, and also propose algorithms for set operations on the BRWT, which had not been presented before in the literature. We conclude that there is no clear choice that outperforms the rest, but we give some recommendations on the use of each compact representation depending on the data distribution and the types of operations performed over the data. We also include a scalability study of the data representations.
    Funding: Ministerio de Ciencia, Innovación y Universidades (TIN2016-77158-C4-3-R); Ministerio de Ciencia, Innovación y Universidades (TIN2016-78011-C4-1-R); Ministerio de Ciencia, Innovación y Universidades (RTC-2017-5908-7); Consellería de Economía e Industria (IN852A 2018/14); Xunta de Galicia (ED431C 2017/58); Xunta de Galicia, co-funded with ERDF (ED431G/01); University of Bío-Bío (192119 2/R); University of Bío-Bío (195119 GI/V)

    Graph Compression for Adjacency-Matrix Multiplication

    19 April 2022: A Correction to this paper has been published: https://doi.org/10.1007/s42979-022-01141-w
    [Abstract] Computing the product of the (binary) adjacency matrix of a large graph with a real-valued vector is an important operation that lies at the heart of various graph analysis tasks, such as computing PageRank. In this paper, we show that some well-known webgraph and social graph compression formats are computation-friendly, in the sense that they allow boosting the computation. We focus on the compressed representations of (a) Boldi and Vigna and (b) Hernández and Navarro, and show that the product computation can be conducted in time proportional to the compressed graph size. Our experimental results show speedups of at least 2 on graphs that were compressed at least 5 times with respect to the original.
    Acknowledgments and funding: We thank Cecilia Hernández for providing us with her software for extracting the bicliques, and a helpful description of how to run it. This research has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie [grant agreement No 690941], namely while the first author was visiting the University of Chile, and while the second author was affiliated with the University of Helsinki and visiting the University of A Coruña. The first author was funded by Fundação para a Ciência e a Tecnologia (FCT) [grant numbers UIDB/50021/2020 and PTDC/CCI-BIO/29676/2017]; the second author was funded by the Academy of Finland [grant number 268324], Fondecyt [grant number 1171058] and NSERC [grant number RGPIN-07185-2020]; the third author was funded by JSPS KAKENHI [grant numbers JP21K17701 and JP21H05847]; the fourth author was funded by AEI and Ministerio de Ciencia e Innovación (PGE and FEDER) [grant number PID2019-105221RB-C41] and Xunta de Galicia (co-funded with FEDER) [grant numbers ED431C 2021/53 and ED431G 2019/01]; and the fifth author was funded by ANID – Millennium Science Initiative Program – Code ICN17_002.

    Compact and indexed representation for LiDAR point clouds

    [Abstract] LiDAR devices are capable of acquiring clouds of 3D points reflecting any object around them, and of adding attributes to each point such as color, position, time, etc. LiDAR datasets are usually large, and compressed data formats (e.g. LAZ) have been proposed over the years. These formats are capable of transparently decompressing portions of the data, but they are not focused on solving general queries over the data. In contrast to that traditional approach, a recent research line focuses on designing data structures that combine compression and indexing, allowing the compressed data to be queried directly. Compression is used to keep the data structure in main memory at all times, thus avoiding disk accesses, and indexing is used to query the compressed data as fast as the uncompressed data. In this paper, we present the first data structure capable of losslessly compressing point clouds that have attributes and jointly indexing all three dimensions of space and the attribute values. Our method is able to run range queries and attribute queries up to 100 times faster than previous methods.
    Funding: Secretaría Xeral de Universidades (ED431G 2019/01); Ministerio de Ciencia e Innovación (PID2020-114635RB-I00); Ministerio de Ciencia e Innovación (PDC2021-120917-C21); Ministerio de Ciencia e Innovación (PDC2021-121239-C31); Ministerio de Ciencia e Innovación (PID2019-105221RB-C41); Xunta de Galicia (ED431C 2021/53); Xunta de Galicia (IG240.2020.1.185)